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Summary 

Capture-recapture  models  are  widely  used  to  estimate  the  unknown  size  of  a  closed 
population,  N.  A  successful  strategy  for  exploiting  information  about  N  in  this  setting 
is  obtained  through  Bayesian  modelling,  as  shown  in  Castledine  (1981).  However,  direct 
Bayesian  approaches  are  often  cumbersome  to  implement  in  this  setting.  In  this  paper,  we 
show  how  Bayesian  sampling,  using  Gibbs  sampling  and  data  augmentation,  is  particularly 
well  suited  for  use  in  a  wide  variety  of  capture-recapture  models,  including  the  multinomial 
and  classical  hypergeometric  models.  This  approach  can  provide  accurate  approximations 
of  posterior  expressions,  including  the  entire  posterior  distribution,  /''j/ j 

Keywords:  Bayesian  computation,  data  augmentation,  Gibbs  sampling,  conjugate  priors,  catch 
propensities,  Petersen  estimator,  behavioral  and  temporal  models. 

1.  Introduction. 

A  common  experimental  setup  for  estimating  N,  the  unknown  size  of  a  closed  pop¬ 
ulation,  is  based  on  sampling  the  population  more  than  once,  paying  special  attention  to 
the  number  of  recaptured  individuals  (i.e.  those  that  appeared  in  more  than  one  sample). 
Such  setups  were  first  used  in  the  context  of  estimating  the  size  of  wildlife  populations, 
where  they  were  baptised  capture-recapture  models  (see  Seber  (1982)  for  an  overview  and 
bibliography).  This  setup  has  also  appeared  in  proofreading  problems  (Polya  (1976)), 
in  reliability  problems  in  manufacturing  quality  control  and  program  debugging  (Jewell 
(1985)  and  Nayak  (1988))  and  in  estimating  the  number  of  vital  human  events  (Mark, 
Setlzer  and  Kroti  (1974)).  A  recent  application  which  has  received  much  attention  is  the 
estimation  of  coverage  error  in  surve>s  and  censuses  (Wolter  (1986)). 

For  the  simple  case  of  drawing  two  samples  from  a  population  of  unknown  size  N , 
Wolter  (1986)  proposes  the  following  general  multinomial  setup.  He  shows  that  a  large 
variety  of  capture-recapture  models  in  the  literature  are  equivalent  to  multinomial  experi- 


1 


ments  where  the  ith  population  member  is  sampled  according  to  the  following  probabilities 

Sample  2 
in  out 

Sample  1  in  p\  1  p\2 

out  P21  P22 

1.  The  realisation  of  such  an  experiment  can  be  summarised 

Sample  2 
in  out 

Sample  1  in  n.n  nu 

OUt  Tin  ^22 

where  nn  +  nj2  +  ri2i  +  =  N.  Note  that  N  remains  unknown,  since  the  value  «22  is 

not  observed.  Nonetheless,  under  certain  assumptions  on  p,  the  vector  of  all  probabilities 
p'jk  above,  information  about  N  can  be  extracted  from  the  cell  counts  nn,  and  7121. 

Although  a  variety  of  frequentist  and  likelihood  approaches  for  making  inference  about 
N  have  appeared  in  the  literature  (see,  e.g.,  Bishop,  Fienberg  and  Holland  (1975,  Chapter 
6),  Burnham  et  al.  (1986),  Pickands  and  Raghavachari  (1989)  and  Huggins  (1989)),  we 
shall  focus  on  the  promising  Bayesian  approach  advocated  by  Castledine  (1981)),  Jewell 
(1985)  and  Smith  (1988).  In  the  capture-recapture  setting,  the  Bayesian  approach  to 
exploiting  information  about  N  proceeds  as  follows.  For  a  particular  capture-recapture 
setup,  the  joint  posterior  distribution  of  N  and  p  can  be  obtained  from  the  likelihood 
L{N,  p|data)  and  the  (possibly  improper)  prior  distribution  n{N,  p), 

7r(.N,pjdata)  a  L(jV,p|data)7r(.N,p).  (1.1) 


where  p\  1  +  p\2  +  P21  +  P22  = 
by  the  counts 


The  Bayes  estimator  for  N  is  then  the  (formal)  posterior  mean,  E[JV|data],  of  the  marginal 
distribution 

7r(TVjdata)  =  j  ir(N,  p|data)dp  (1.2) 

A  major  deficiency  of  this  approach  has  been  the  difficulty  of  calculating  the  posterior  mean 
and  other  posterior  quantities.  Unfortunately,  closed  form  expressions  for  the  marginal  in 
(1.2)  are  often  unavailable,  and  thus  approximate  methods  have  to  be  used. 

The  purpose  of  this  paper  is  to  show  how  Bayesian  sampling  is  a  promising  alterna¬ 
tive  to  both  analytical  calculation  and  numerical  approximation  in  the  Bayesian  analysis - — • 

of  capture-recapture  models.  The  essential  idea  behind  Bayesian  sampling  is  to  obtain  in-  ?0i! - - 

formation  about  marginal  posterior  distributions  by  indirect  sampling,  see  Robert  (1990).  -1  5^ 

In  the  capture-recapture  setting,  this  effectively  consists  of  obtaining  a  random  sample 

from  the  marginal,  l0B _ 

Nu...,Nn~n(N  |data),  (1.3)  - 

without  integrating  out  7*  in  (1.2).  By  taking  *  larg-  encugh  sample,  the  posterior  mean  - - 

or  any  other  posterior  quantity  can  then  be  estimated  to  the  desired  degree  of  accuracy.  loia/ - - 

ilty  Codes 

.Avail  and/or 

^  ',v  '  Jtst  I  Special 


□  □ 


Rather  than  sample  directly  from  the  marginal  posterior  7r(#|data),  Bayesian  sam¬ 
pling  exploits  the  conditional  distributions 

7r(JV|p,data)  and  :r(p|.#,data).  (1.4) 

This  can  be  done  by  using  a  special  case  of  Gibbs  sampling  as  well  as  data  augmentation, 
see  Gelfand  and  Smith  (1990)  or  Diebolt  and  Robert  (1990).  We  note  that  these  two 
approaches  are  not  the  same  in  general.  Starting  with  an  initial  value  for  N,  say  #J,  each 
sample  point  in  (1.3)  is  obtained  by  sampling  iteratively  from 

Nk  ~  *r(#jp*-i,data)  and  pfc  ~  7r(p|#£,data).  (1.5) 

It  follows  from  Diebolt  and  Robert  (1990)  that  the  distribution  of  N£  converges  to  7r(#| 
data)  as  k  goes  to  +oo.  Thus,  for  k  large  enough,  Ni  =  N£  is  effectively  an  observation 
from  7r(#|data).  By  repeating  this  procedure  n  times,  the  random  sample  in  (1.3)  is 
obtained. 

A  powerful  advantage  of  Bayesian  sampling  over  analytical  calculation  is  that  obtain¬ 
ing  the  random  sample  in  (1.3)  does  not  require  the  marginal  posterior  (1.2).  Instead,  all 
that  is  needed  are  the  conditional  distributions  in  (1.4).  As  will  be  seen  in  the  sequel, 
these  are  generally  easy  to  obtain  in  the  capture-recapture  setting.  Furthermore,  useful 
priors  can  be  chosen  so  that  these  conditional  distributions  will  be  of  standard  form,  allow¬ 
ing  for  fast  and  efficient  simulation  of  the  iterative  sampling  in  (1.5).  Another  advantage 
of  Bayesian  sampling  is  that  the  multinomial  model  above  can  be  made  to  subsume  the 
classical  hypergeometric  model  which  also  arises  in  this  setting  (see,  e.g.,  Darroch  (1958)) 
when  the  sample  sizes  are  fixed  (see  Section  4). 

This  plan  of  this  paper  is  as  follows.  In  Section  2,  we  present  different  models,  following 
the  classification  of  Wolter  (1986).  In  Section  3,  we  derive  the  associated  Bayesian  models 
and  illustrate  the  advantages  of  a  Bayesian  sampling  approach.  In  Section  4,  we  indicate 
how  our  methods  also  provide  a  convenient  solution  for  the  classical  hypergeometric  model. 
Section  5  studies  some  heterogeneous  extensions  and  gives  particular  attention  to  stratified 
models.  Section  6  extends,  through  an  example,  the  previous  results  to  a  multiple  recapture 
setting. 


2.  Some  capture-recapture  models. 

Consider  a  closed  population  of  unknown  size  N.  Two  random  samples  are  drawn 
consecutively  from  this  population.  Let  ni  and  n 2  be  the  sizes  of  these  two  samples,  with 
ni  =  nn  +  «i2  and  n 2  =  nn  +  021,  nn  being  the  size  of  the  intersection  of  the  two 
samples.  The  three  random  variables  nn,  nu  and  «2i  are  observed,  as  described  above. 
We  denote  by  n.  the  total  number  of  captures,  i.e.  n.  =  ni  +  «2,  and  by  n+  the  number 
of  distinct  captured  objects,  i.e.  n+  =  nn  +  nn  +  n2i.  In  the  cases  considered  below, 
nn,  n’2  and  n2i  are  sufficient  statistics.  Notice  that  here  it  is  not  necessary  to  know  the 
complete  ‘history’  of  each  captured  individual. 

This  setup  can  be  generalised  for  a  larger  number  of  recapture  events,  as  in  Castledine 
(1981)  or  Wolter  (1986).  However,  the  extension  to  these  cases  is  straightforward  and, 
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until  Section  6,  we  focus  on  the  single  recapture  case.  Following  Wolter  (1986),  we  present 
below  several  capture-recapture  models,  distinguished  by  different  assumptions  on  the 
probabilities  of  capture. 

2.1.  The  uniform  model  Mo.  For  the  model  M0,  each  individual  has  the  same  prob¬ 
ability  p  of  being  captured  in  the  first  or  the  second  sample.  The  likelihood  function  for 
this  model  is 


£o(JV,p|nu,ni2 


N 

nil  7*12  7*21 


-  p)2N-n 


The  maximum  likelihood  estimator  of  N  is  then 


where  [  .  J  denotes  the  integer  part  function. 

2.2.  The  behavioral  model  M &.  For  the  model  Mb,  the  probability  of  recapture  c  is 
different  from  the  probability  of  initial  capture,  p.  If  e  >  p,  the  individuals  are  said  to 
be  ‘trap-happy’  and  if  c  <  p,  the  individuals  are  ‘trap-shy’.  For  instance,  in  some  wildlife 
experiments,  captured  animals  are  often  less  likely  to  be  captured  a  second  time,  in  which 
case  c  <  p.  The  likelihood  function  for  this  model  is 


V*ll  7*12  7121  / 


N 

^7*ii  ni2  n2i 

and  the  maximum  likelihood  estimator  is 


Nb  = 


n+  ( 

1  (’*-'• 

L  \ 

7*1  /  J 

2.3.  The  temporal  model  Mt.  For  the  model  Mt,  the  probability  of  capture  for  the 
first  sample,  pi,  is  different  from  the  probability  of  capture  in  the  second  sample,  p2*  In 
this  case,  either  the  whole  population  is  affected  by  the  first  capture  or  the  conditions  of 
the  experiment  have  been  modified.  The  likelihood  function  for  this  model  is 

MJV,p,,p2|7*ii,ni2,n21)=  T  f  Wp?J(l (l~P2)*-ns 

\7*1 1  7112  7121  / 

and  the  maximum  likelihood  estimator  is 


Nt  = 


7*  1 72. 2 
.  Tin 


Wolter  (1986)  also  calls  Mf  the  Petersen  model,  because  the  maximum  likelihood  estimator 
agrees  with  the  maximum  likelihood  estimator  in  the  classical  Petersen  model  where  the 
sample  sizes  ni  and  n2  are  fixed  (see  Section  4). 
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More  complex  models  can  also  be  introduced.  For  instance,  the  combined  model  Mbt 
takes  into  account  a  change  between  the  two  captures  and  for  the  captured  objects.  Section 
5  deals  with  a  general  heterogeneous  model  and  Section  6  with  a  multiple  recapture  model. 

3.  Conditional  posterior  distributions. 

A  Bayesian  analysis  of  any  of  the  three  capture- recapture  setups  Afo,  Mb  or  Aft  de¬ 
scribed  in  Section  2,  would  proceed  by  multiplying  a  (possibly  improper)  prior  7r(JV,  p)  by 
the  corresponding  likelihhod  Lq,  Lb  or  Lt  to  obtain  the  posterior  distribution  ir(N,  p|data) 
as  in  (1.1).  In  this  section,  we  illustrate  how  for  certain  priors,  standard  conditional  poste¬ 
rior  distributions  7r(p|./V,data)  and  7r(JVjp,data)  are  obtained  which  allow  for  an  efficient 
sampling  simulation.  Thus,  repeated  iterative  sampling  from  these  standard  conditional 
posterior  distributions  as  in  (1.5)  is  a  readily  available  method  for  obtaining  a  random 
sample  from  the  marginal  posterior  ?r(AT|data).  The  Bayes  estimates  of  N,  IE[A/jdata],  or 
any  other  posterior  quantity  of  interest,  can  then  be  estimated  to  the  desired  degree  of 
accuracy.  We  also  show  that  in  these  cases,  the  marginal  posterior  distribution  7r(lV|data) 
is  analytically  unwieldy,  making  infeasible  the  alternative  of  ‘integrating  out’  p  in  (1.2). 

In  what  follows,  we  focus  on  priors  of  the  form 

*{N,p)  =  ff(lV)jr(p),  (3.1) 

where  7r(./V)  is  Poisson  Po( A).  Raftery  (1988),  in  the  related  problem  of  binomial  N 
estimation,  also  used  a  Poisson  prior  on  N.  However,  it  should  be  apparent  that  our 
developments  can  be  adapted  to  other  prior  distributions.  For  instance,  Castledine  (1981) 
used  an  improper  prior,  n(N)  a  l/N  (see  also  Section  6). 

For  a  prior  of  the  form  (3.1)  in  model  Afo,  the  joint  posterior  is 

*-(W,p|nn,n12,n2i)  a  j-^,pn q2N~n  n(p)n(N) 

a  (N-T7i<{pM''  ’lp)' 

where  q  =  1  —  p.  Therefore, 

7t(7V  —  n+|p,nn,ni2,n2i)  =  Po{q2X) 

n{p\N,nn,ni2tn2i)  a  pn  q2N~n  n(p). 

Depending  on  the  prior  distribution  *r(p),  we  may  use  Bayesian  sampling  or  integrate  out 
the  parameter  p  to  obtain  the  posterior  distribution  7r(jV|nn,ni2,n2i).  For  instance,  if 
7r(p)  is  8c(a,0),  the  marginal  posterior  distribution  is  very  cumbersome  to  compute  while 
Bayesian  sampling  is  straightforward.  Indeed,  we  have 

n„.n„.«2,)  <x  p-*"  -‘(1  -  p)*»-"  +>'ldp 

XN  B(a  +  n+,/7  +  2N  —  n.) 

a  (AT  -  n+)!  B(a,0) 

-  A*  (a  -t-  n+  —  1) . . .  a(0  +  2N  —  n.  —  1) . . .  /? 

~{N-n+) !  {a  + 0  +  2N -l)...{a  + 0) 


5 


while 


ir(N  -  n+|p,nn,ni2,n2i)  =  ?o(q2\) 

7r(p|-W>nii»ni2«n2i)  =  Be(a  +  n.,0  +  2N  —  n.). 

Other  distributions  on  N  do  not  modify  the  complexity  of  the  posterior  distribution  (except 
if  they  have  small  finite  support). 

For  model  M&,  we  consider  the  special  case  of  (3.1)  where  ir(N,p)  =  7r(p)7r(e)7r(./V) 
and  7r(jV)  is  Poisson  Po( A).  The  joint  posterior  here  is 

Nt 

7r(jV,p,e|nu,n12,n21)  «  ^  j,pw*  eW|>  (!  -  c)n*1  q2N~n  ir(p)n(c)n{N) 

Therefore, 

■k{N  -  n+|p,nn,ni2,n2i)  =  Po(q2X) 

Tr{p\N,nn,ni2,n2i)  <x  pn+q2N~n  n(p) 
jr(c|nn,ni2,n21)  «  c”M(l  -  c)na,7r(c). 

Once  again,  in  the  case  where  the  prior  distributions  on  p  and  c  are  beta  distributions,  a 
good  approximation  to  the  posterior  distribution  of  N  is  provided  by  Bayesian  sampling, 
while  a  direct  approach  faces  the  same  computational  problems  as  for  the  model  Mo. 

For  model  Mt,  we  consider  the  special  case  of  (3.1)  where  n(N,  p)  =  7r(pi)7r(p2)7r(A0 
and  7r (jV)  is  Poisson  Po(A).  The  joint  posterior  here  is 


7r(iV,pi,p2|nn,nl2,n2i)  a  (jy  I  n+)\P" ‘P^C1  ~  Pl)*  n'(1-P2)*  "MPi^^M^) 


Therefore, 

*{N  -  n+|p,nn,n12,n21)  =  Po([  1  -  pi)(l  -  p2)A) 

ff(pi|^,nUlni2,n21)  a  p?1  (1  -  Pi)N~n'  *r(pi) 

jr(p2|iV,nii,ni2,n2i)  oc  p2s(l  -  p2)N~n’ x(p2). 

The  same  remarks  as  in  the  previous  models  apply  here. 

The  prior  distributions  used  above  require  prior  information  about  the  parameters  a, 
0  and  A.  In  the  absence  of  prior  information,  we  suggest  a  =  /?  =  1  and  either  using 
7 r(N)  =  L/N  (see  Castledine  (1981)),  which  corresponds  to  the  Jeffreys  prior  tt(A)  = 
A-1,  or  replacing  A  by  the  maximum  likelihood  estimator  of  the  appropriate  model,  thus 
advocating  a  pseudo-empirical  Bayes  approach.  Another  alternative  is  developed  in  the 
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next  section,  where  we  show  that  the  classical  hypergeometric  model  can  be  represented 
as  a  limiting  case  of  the  temporal  model. 


4.  Hyper  geometric  model- 

We  have  been  dealing  until  now  with  the  binomial  model,  where  the  two  sample  sizes 
are  supposed  to  be  random.  Although  adequate  for  many  practical  situations,  a  simpler 
model  has  also  been  studied  in  the  literature,  where  the  two  sample  sizes  are  fixed  (or 
the  model  is  conditional  on  these  two  sample  sizes).  It  is  called  the  Petersen  model  or 
the  Darroch  model  (see  Darroch  (1958),  Casteldine  (1981),  Seber  (1982)).  The  sample 
distribution  of  nu  is  given  by 

f(nn\N)  =  (4.1) 


It  is  then  obvious  that  a  direct  Bayesian  approach  will  lead  to  rather  complicated  com¬ 
putations,  except  in  the  particular  case  of  bounded  uniform  priors.  We  show  below  that 
Bayesian  sampling  allows  for  a  much  more  efficient  treatment. 


Starting  back  with  the  temporal  model  Mt,  we  see  that 
-r(JV|p„Pl,n+)  «  (^V,  +  '(>  - 


where  n  =  1  -  (1  -  Pi)(l  -  P2).  In  particular,  if  ir(N)  =  1,  the  posterior  distribution  of  N 
is  Meg(n+,n)-  If,  in  addition,  the  prior  distribution  on  pi  and  p2  is  Sc(a,  /?),  the  marginal 
posterior  distribution  satisfies 


7r(.N|n1,n2,n+)  a 


Nl  (N-rn  +  (3-  1)!  {N-n2  +  0-  1)! 
[N  -  n+)!  (N  +  a  +  /?  -  1)!  {N  +  a  +  /?  -  l)! 

N\  (JV-ni)!(AT-n2)! 

(N  —  n+)!  N'.  N\ 


a 


c, 

O 


if  a  =  0  and  (3=1.  Therefore,  the  hypergeometric  model  can  be  rewritten  as  a  Mt  model 
with  improper  prior  distributions  7r(iV)  =  1  and  7r(p)  =  1/p,  a  fact  noted  by  Castledine 
(1981).  Although  these  priors  may  not  really  correspond  to  a  true  prior  opinion,  combining 
the  hierarchical  decomposition  of  (4.1)  with  Bayesian  sampling  provides  a  very  efficient 
tool  for  computing  the  Bayes  estimators,  since  the  conditional  distributions  for  p\  and  p2, 


7r(p,|iV,ni)  oc  Be(rii,N  +  1  -  n,),  (*  =  1,2) 

are  as  easy  to  simulate  as  Meg(n+,fi).  Note  that  the  temporal  aspect  of  the  model  is 
absent  from  the  original  formulation,  as  well  as  the  probability  p.  In  this  case,  Bayesian 
sampling  makes  use  of  the  hierarchical  representation  to  approximate  the  Bayes  estimator, 
even  though  it  does  not  necessarily  correspond  to  a  true  state  of  Nature.  (See  Robert 
(1990)  for  additional  comments  on  the  utilisation  of  mixture  representations  by  Bayesian 
sampling.) 
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5.  Heterogeneous  extensions. 

As  considered  in  Burnham  and  Overton  (1978),  Castledine  (1981)  and  Huggins  (1989), 
a  more  general  model  can  obviously  be  introduced,  namely  that  the  probability  is  different 
for  each  individual  (*)  and  each  capture  (j)  and  denoted  by  p»,-.  The  likelihood  is  then 

t=i,=i 


where  A  =  (du>* . .  ,djv2)  is  the  vector  of  the  capture  indicators  for  all  members  of  the 
population  and  each  capture.  By  convention,  the  first  members  are  the  captured  members. 
In  this  more  general  context,  their  ‘history’  obviously  counts. 


5.1.  Bayesian  analysis.  From  a  pure  Bayesian  point  of  view,  we  can  estimate  the 
parameters  of  the  model  if  the  available  prior  information  is  sufficient.  For  instance,  if  the 
priors  on  the  probabilities  p,y  are  all  beta  Be(ay0)  priors,  the  marginal  distribution  of  A 
given  N  is 


/(A|JV) 


=nn/o 

t=l ;  =  1 


1  (i  py*'-1 


s(«,0) 


dpi. 


i=i y=i 


B(a  4-  Sj  j,  0  +  1  —  6jj) 
B(a,0) 


-nnM 

-  (=h)'  (=£?) 


) 


2N-n 


Therefore,  the  information  about  the  size  of  the  intersection  between  the  two  samples  is 
not  used  by  the  likelihood.  This  result  is  not  very  surprising  since  the  probabilities  vary 
between  the  two  captures.  If  the  prior  on  N  is  again  a  Poisson  distribution  Po( A),  the 
posterior  distribution  of  (N  —  n.)  is 


Here  the  posterior  distribution  is  directly  available  and  there  is  no  need  to  call  for  Bayesian 
sampling.  Note  also  that  the  marginal  distribution  of  n.  given  N  is  constant  when  a  =  0. 

In  the  case  when  the  probability  is  the  same  for  each  capture  and  p,  —  8e(a,0),  the 
marginal  distribution  of  A  given  N  is 


n  =  TT  f1  nStii-gj)  U  ~  SiPi+S'~ldpi 

B{a,0) 


aw) = n  /o  2 

B(a  +  0  +  2  —  Si) 


i=i 

N 

=n 

«=i 


B(a,0) 
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where  6,  =  <5,i  +  <5,2-  Therefore,  the  distribution  of  (nn,ni2,n2i)  given  N  is 

\|»11  «12  nu)  [(a  +  /?)(a  +  /?  +  1)]2JV 

In  this  case,  the  posterior  distribution  of  (N  —  n+)  is 


Po 


+  D 


,(.-*  +  /?)  (a +  /3+1) 


thus  depends  only  on  the  ‘effective’  size  of  the  double  sample,  n+.  If  a  =  0  =  1,  the 
density  of  (nn,ni2,n2i)  is  again  constant. 

These  two  results  show  that  a  totally  heterogeneous  model  actually  brings  very  little 
information  on  the  population  size  N  since  too  many  parameters  have  to  be  controlled  at 
the  same  time.  In  addition,  the  fact  that  the  probabilities  are  all  different  calls  for  many 
recaptures,  as  it  is  the  case  in  Huggins  (1988).  This  is  also  an  example  of  a  situation  where 
Bayesian  sampling  is  of  no  use,  since  :he  marginal  posterior  distribution  of  N  is  much 
more  straightforward  than  the  conditional  posterior. 

5.2.  Known  strata.  A  more  amenable  case  of  heterogeneity  occurs  when  the  population 
is  divided  into  known  strata  such  as  male/female ,  young /adult /senior,  etc...  This  type  of 
division  usually  occurs  in  surveys  and  censuses.  We  can  then  use  the  approach  of  Section 
3  according  to  two  scenarios: 

(i)  The  strata  are  independent  and  each  stratum  size  N,  (s  =  1,...,S)  has  a  prior 
distribution  Po( A,).  We  have  then  S  independent  replications  of  the  model  considered 
in  Section  3.  Bayesian  sampling  provides  an  easy  approximation  of  the  posterior 
distributions  of  JVj, . . . ,  Ns  and  thus  of  the  posterior  distribution  of  N  =  N\+. .  .+Ns- 

(ii)  The  population  size  N  has  a  prior  distribution  Po{\)  and  the  strata  sizes  (N i , . . . ,  Ns) 
follows  a  multinomial  distribution  Ms{N‘,u>i, . . .  ,(*>5).  This  assumption  is  often  jus¬ 
tified  in  the  case  of  surveys  where  the  proportions  of  the  subpopulations  are  rather 
well-known.  However,  when  N  is  integrated  out,  this  model  appears  jus  a  special  case 
of  (i),  since 

N,  ~  Po(\ut). 


In  both  cases,  we  can  see  that  Bayesian  sampling  allows  for  an  easy  computation  of  the 
strata  sizes  estimators. 

5.3.  Unknown  strata.  It  may  also  happen  that  the  population  is  divided  into  5  strata 
which  are  impossible  to  detect,  even  after  individuals  have  been  captured.  For  instance, 
this  occurs  when  a  part  of  the  population  has  an  undetected  disease  (e.g.,  seropositivity) 
which  modifies  its  behaviour.  Therefore,  the  probability  of  capture  of  a  given  member  of  the 
population,  p,,  is  one  of  the  probabilities  Xj, ...  ,^5  for  the  different  strata,  with  probability 
(1  ^  5  <  $)  corresponding  to  the  proportion  of  the  stratum  in  the  whole  population. 
We  can  model  the  probabilities  irt  as  beta  distributions,  5e(a#,/?,)  (1  <  s  <  5).  For  any 
prior  distribution  on  N ,  it  is  then  easy  to  see  that  x(IV|data)  cannot  be  used  in  practice, 
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even  though  it  may  be  expressed  in  a  closed  form  for  some  priors.  This  is  often  the  case 
with  Bayesian  analyses  of  mixture  models,  since  all  the  partitions  of  the  sample  have  to 
be  considered  (see  Diebolt  and  Robert  (1990)). 

Consider  a  Poisson  distribution  on  N  associated  with  the  previous  model.  We  then 
introduce  an  auxiliary  parameter  to  implement  the  Bayesb.n  sampling  approach.  Let  2,, 
(l  <  i  <  N,  1  <  s  <  S)  denote  the  indicator  function  I{p,  =  x,}.  Therefore, 

(^■*1 »  •  »  •  y  Z\S  )  ^1 S  (l  1  ^1  y  •  ■  •  y  ^ S  ) 

and,  conditionally  on  2,  =  (2,1, ... ,  2,5), 


Pi 


s 


t=  1 


(i.e.  2,  completely  determines  p,).  It  is  easy  to  see  that  the  simulation  steps  for  Bayesian 
sampling  are  then 

1.  irt  ~  ty  data)  (s=l,...,5) 

2.  2,  ~  7r(2t|.(V, 7r,data)  (i  =  l,...,JV) 

3.  TV  ~  7r(./V|7r,z,data), 


where 


^(tt«  1-^,2,  data)  =  Bt{a,  +  mt,0,  +  2 n,  -  m,), 
7r(2,|Ar,  7r ,  data)  =  .Ms  1;  1  *  - - J - ~ 


7r(iV|7T,  z,  data)  =  Po  (A(l  -  7r1)’’1  ...  (1  -  7Ts)rs) 


and 


N 


ff 


m>  =  Y]  Zi*6*y  n,  =  Zi„  T,  =  n,/N. 


>=i 


is  I 


The  variables  n„,  m„  and  rt  arc  upda«._d  at  step  2. 

This  particular  case  provides  a  strong  argument  in  favor  of  Bayesian  sampling  since 
it  appears  to  be  the  only  way  to  study  this  intricated  setup.  A  non-Bayesian  approach 
cannot  handle  all  the  parameters  and  a  formal  Bayesian  approach  requires  an  enormous 
amount  of  computing  time. 


6.  A  multiple  recapture  example. 

In  this  section,  we  briefly  illustrate  the  extension  of  our  techniques  to  the  multiple 
recapture  setup  by  application  to  a  real  data  set.  The  data  set  we  consider  is  from  Castle- 
dine  (1981)  and  appears  in  Table  1.  It  consists  of  14  capture  events  from  a  population  of 
sunfish.  At  the  xth  capture,  n,  fishes  are  captured,  out  of  which  m,  have  been  previously 
captured.  Thus,  n+  =  E,  (nt  -  mi)  =  138  is  the  total  number  of  different  fish  captured. 
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i 

1 

2 

3 

4 

5 

6 

7 

10 

27 

17 

7 

1 

5 

6 

m, 

0 

0 

0 

0 

0 

0 

2 

i 

8 

9 

10 

11 

12 

13 

14 

n. 

15 

9 

18 

16 

5 

7 

19 

1 

5 

5 

4 

2 

2 

3 

Table  1.  Multiple  recapture  data  for  a  population  of  sunfish. 

Following  Castledine  (1981),  we  consider  a  temporal  model  Mt  with  s  =  14  capture 
episodes.  We  put  the  same  prior  Be(a,0)  on  each  of  the  capture  probabilities  pi,...,p, 
and  use  the  prior  n (N)  oc  1/N.  Analogous  to  the  developments  in  Sections  2  and  3,  it  is 
straightforward  to  show  that  the  conditional  distributions  are 

7r(iV|pi ,  ...,p,,  data)  =  Meg(n+  -  l,p),  (6.1) 


with  M  =  1  —  (1  —  Pi) ...  (1  —  p.),  and  (t  =  l,...,s) 

7r(pt | TV, data)  =  Be(a  +  +  N  -  nt).  (6.2) 

We  employed  Bayesian  sampling  to  obtain  the  posterior  quantities  listed  in  Table  2 
and  the  posterior  histograms  in  Figure  1.  This  was  done  as  follows.  For  each  of  seven 
choices  of  Beta  parameters  (a,/?),  we  simulated  a  random  sample  of  size  n  =  1000  from 
the  marginal  posterior  as  in  (1.3).  We  iteratively  sampled  as  in  (1.5)  from  (6.1)  and  (6.2) 
a  total  of  fifty  times  to  obtain  each  sampled  observation.  (The  simulations  were  performed 
with  the  IMSL  routines  RNNBN  and  DRNBET.)  The  mle  for  N,  here  460,  was  used  as 
the  starting  value  Nq.  Finally,  the  posterior  quantities  were  estimated  by  their  sample 
moments. 


Prior 

Posterior 

Posterior 

Posterior 

95% 

credible 

95% 

credible 

parameters 

of  NW 

of  log (N)W 

of  log(JV)(3) 

interval 

interval 

a  0 

Mean  St. Dev. 

Mean  St.Dev. 

Mean  St.Dev. 

for  N 

for  N  W 

0  1 

448  84 

6.09  0.18 

6.15  0.18 

312  -  650 

322  -  656 

2  100 

506  70 

6.21  0.14 

6.21  0.14 

378  -  663 

393  -  676 

3  100 

419  49 

6.03  0.12 

6.05  0.12 

332  -  520 

335  -  539 

10  500 

548  54 

6.30  0.10 

6.31  0.10 

454  -  664 

454  -  671 

15  500 

408  37 

6.01  0.09 

6.02  0.09 

342  -  485 

347  -  489 

20  1000 

548  49 

6.32  0.09 

6.33  0.09 

471  -  665 

470  -  666 

30  1000 

406  32 

6.01  0.08 

6.01  0.08 

348  -  475 

350  -  478 

Table  2.  Comparison  between  Bayesian  sampling  approximation  ^ 
and  the  normal  approximation  in  Castledine  (1981) 
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F  igure  1  should  appear  around  here 


We  have  also  included  in  Table  2  the  estimates  of  Castledine  (1981)  who  performed  the 
same  analysis,  but  used  a  normal  approximation  to  the  posterior  distribution  of  log(JV), 
to  obtain  the  required  quantities.  The  agreement  between  our  numbers  and  his  shows  that 
Castledine’s  approximations  performed  remarkably  well  in  most  of  the  cases.  Notice  that 
the  greatest  disagreement  between  the  two  approaches  occurs  for  small  values  of  a  and  /?, 
especially  for  a  =  0,  /?  =  1,  when  prior  information  is  weakest.  One  can  see  in  Figure  1  that, 
in  this  case,  the  skewness  of  the  posterior  distribution  of  N  is  not  entirely  removed  by  the 
log  transformation  so  that  Castledine  approximation  will  be  off,  especially  for  computing 
the  interval  estimates.  An  important  justification  of  Bayesian  sampling  techniques  is  that 
they  can  circumvent  the  problems  of  inaccurate  normal  approximations,  as  pointed  out  in 
Tanner  and  Wong  (1987).  Finally,  note  that,  in  this  analysis,  the  estimates  of  N  seem  to 
be  sensitive  to  the  choice  of  the  Beta  prior.  Thus,  it  is  probably  most  reasonable  to  use 
the  estimates  for  the  choice  (a,/3)  =  (0, 1)  which,  except  for  the  slightly  different  prior  on 
N,  yields  the  hypergeometric  case  discussed  in  Section  4. 

7.  Conclusion 

This  paper  has  shown  that  in  a  large  variety  of  capture-recapture  models,  Bayesian 
sampling  can  provide  a  fast  and  efficient  approach  to  obtaining  posterior  information  in 
Bayes  analyses.  The  particular  models  we  considered  were  meant  to  illustrate  the  gen¬ 
eral  ideas  for  implementing  Bayesian  sampling  in  this  context.  Of  course,  our  treatment 
was  by  no  means  exhaustive,  as  there  are  many  other  variants  and  generalisations  where 
Bayesian  sampling  should  also  be  fruitful.  For  instance,  it  would  be  interesting  to  apply 
this  approach  to  the  analysis  of  open  population  extensions  which  take  into  account  the 
deaths  and  immigrations  which  actually  occur  in  wildlife  populations.  Another  variation, 
studied  by  Seber  and  Felton  (1981),  is  to  consider  tag-loss ,  namely  the  misclassification 
of  recaptured  objects  as  newly  captured  objects.  This  last  variant  may  also  be  used  in 
epidemiology  when  the  proportion  of  people  in  each  stratum  varies  between  the  captures. 
Another  possible  extension  deals  with  the  estimation  of  the  number  of  species,  as  con¬ 
sidered  in  Efron  and  Thisted  (1976).  The  key  to  the  availability  of  Bayesian  sampling 
methods  in  all  of  these  extensions  is  the  availability  of  conditional  distributions  for  which 
easy  computing  is  practically  possible. 

Bayesian  sampling  is  not  a  new  inferential  procedure  but  rather  an  approach  for  fa¬ 
cilitating  Bayesian  inference,  i.e.  a  powerful  tool  at  the  end  of  the  Bayesian  processing 
chain.  If  a  different  inferential  approach  is  desired,  then  Bayesian  sampling  will  be  irrel¬ 
evant.  The  main  criticism  of  Bayesian  inference  is  its  requirement  of  prior  input.  If  such 
prior  information  is  available,  and  it  will  be  in  some  analyses,  we  believe  a  direct  Bayesian 
approach  is  reasonable.  However,  when  little  prior  information  is  available,  we  would  rec¬ 
ommend  robust  Bayesian  methods.  It  turns  out  that,  in  capture-recapture  settings,  such 
robust  approaches  are  indeed  possible.  These  can  be  obtained  by  the  hierarchical  and  em¬ 
pirical  Bayesian  approaches  discussed  in  Sections  3  and  4.  Fortunately,  powerful  Bayesian 
sampling  methods  are  also  readily  available  in  these  cases. 
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Figure  1  -  Posterior  histograms  for  S  and  log(AT)  and  (a,/3)  =  (0,1). 
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